Take your climate visualizations to the next level!
Author
Fanni Varhelyi
Introduction
Have you ever presented a visualization and wished it could respond as you talked through it, for example by showing key information on hover? How about fully animated charts that walk audiences through time or space? If you already have a basic understanding of data visualization in R and the ggplot2 package, achieving this is not that difficult. Mastering it, as with any data visualization, can be more of a challenge. The aim of this short tutorial is to showcase a couple of simple but effective data visualization tools and packages that you can leverage for interactivity and animation. Beyond providing examples and starter code, I will also talk about what differentiates a quick solution from a good one, and show some applications.
To start with, I will introduce two easy ways to make plots interactive: ggplotly and native Plotly (both based on the Plotly library). Then, I will show you how to animate plots to visualize data across time or space. For all of these examples, I will leverage climate data to show how these tools and tips can help take your climate messaging to the next level.
There are many more options to choose from, showcased by the R graph gallery. Check them out for more fun ideas!
Data: what are we working with?
It is always good practice to understand your data: what is included, whether anything is missing, and what the unit of analysis is. For our tutorial, we will leverage data from two sources. The main data source is Climate TRACE emissions data, which tracks two things between 2015 and 2021: emissions per country per year, and emissions per industry per year. The dataset has already been preprocessed: it summarizes emissions per year, per country, and per sub-sector, and all emissions (CO2, CH4 and N2O) are in million tonnes.
The other source of information I will leverage is the Gapminder dataset. Gapminder produces its own visualizations and provides information on various topics; its data is also accessible for free. I downloaded the population v7 dataset for this exercise, as the Gapminder dataset available in R does not contain population information for the time period we’re interested in.
We can quickly extract the population information per year and add it to our Climate TRACE data with a merge. Finally, for an added data analysis aspect, we can also divide the emissions by the population to get per capita emissions, which might provide a useful additional view. With that done, we have our data and can start plotting!
Code
library(dplyr)   # joins and data manipulation
library(knitr)   # kable() for table output

# Merge the two datasets
# First we need to ensure we have the same data type for matching:
gapminder$time <- as.integer(gapminder$time)

# Then we can merge
df <- inner_join(climate_trace, gapminder, by = c("country" = "name", "year" = "time"))

# Let's also create a per capita view on CO2 emissions. This will be in tonnes,
# not in million tonnes. We also need to change Population to a number while at it.
df <- df %>%
  mutate(Population = as.numeric(Population)) %>%
  mutate(co2_capita = co2 / Population * 1000000)

# Let's look at the results: check for NAs (if there are any, we might have
# different naming of countries in the datasets), then display some of the
# results in a table.
print(paste('Are there any missing values in the table?', any(is.na(df))))
[1] "Are there any missing values in the table? FALSE"
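One caveat about this check: inner_join() silently drops rows that have no match, so a country spelled differently in the two sources would disappear rather than show up as a missing value. A minimal sketch of a complementary check with anti_join() follows. The two small tables and their values are made up for illustration (they are not the tutorial data), and the names emissions, population and unmatched are hypothetical.

```r
library(dplyr)

# Toy stand-ins for the two sources (all values are made up for illustration)
emissions <- tibble(
  country = c("Russia", "South Korea", "Iran"),
  year    = c(2020L, 2020L, 2020L),
  co2     = c(1500, 600, 700)
)
population <- tibble(
  name       = c("Russia", "Korea, Rep.", "Iran"),
  time       = c(2020L, 2020L, 2020L),
  Population = c(144e6, 52e6, 84e6)
)

# Rows of `emissions` that find no partner in `population`:
# these are the rows an inner_join() would silently drop
unmatched <- anti_join(emissions, population,
                       by = c("country" = "name", "year" = "time"))

print(unmatched$country)
```

If this returns any rows, the country names need to be harmonized before merging.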
Code
kable(head(df))
|country     | year|sector.subsector       |       co2|       ch4|       n20|geo | Population| co2_capita|
|:-----------|----:|:----------------------|---------:|---------:|---------:|:---|----------:|----------:|
|Afghanistan | 2015|agriculture            | 0.0258718| 0.4491568| 0.0129171|afg |   33753499|  0.0007665|
|Afghanistan | 2015|buildings              | 0.4001354| 0.0055361| 0.0000360|afg |   33753499|  0.0118546|
|Afghanistan | 2015|fluorinated-gases      | 0.0000000| 0.0000000| 0.0000000|afg |   33753499|  0.0000000|
|Afghanistan | 2015|fossil-fuel-operations | 0.4387276| 0.0330541| 0.0000124|afg |   33753499|  0.0129980|
|Afghanistan | 2015|manufacturing          | 2.6236714| 0.0002453| 0.0000305|afg |   33753499|  0.0777304|
|Afghanistan | 2015|mineral-extraction     | 0.0007360| 0.0000000| 0.0000000|afg |   33753499|  0.0000218|
Interactivity in R
For the interactive visualization, I will create a line chart of yearly country-level emissions.
Let’s create a simple visualization of our data using ggplot. For this visualization, we will filter to the top 10 emitting countries, as showing all countries would be overwhelming. We will also focus on absolute CO2 emissions.
The static visualization looks like this:
Code
library(ggplot2)

# Identify the top 10 emitters based on CO2
top_emitters <- df %>%
  group_by(country) %>%
  summarize(total_emissions = sum(co2, na.rm = TRUE)) %>%
  arrange(desc(total_emissions)) %>%
  slice_head(n = 10) %>%
  select(country)

# Filter our data based on these 10 countries and sum over sectors
filtered_data <- df %>%
  semi_join(top_emitters, by = "country") %>%
  group_by(country, year) %>%
  summarize(
    co2 = sum(co2, na.rm = TRUE),
    ch4 = sum(ch4, na.rm = TRUE),
    n20 = sum(n20, na.rm = TRUE)
  ) %>%
  ungroup()

ggplot(filtered_data, aes(x = year, y = co2, group = country)) +
  geom_line(aes(color = country)) +
  geom_point(aes(color = country)) +
  scale_color_brewer(palette = "PuOr") +
  labs(y = 'CO2', x = "Year", color = 'Country',
       caption = 'Source: Climate Trace', title = 'Country-level carbon emissions') +
  theme_minimal() +
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE)) +
  scale_x_continuous(breaks = unique(filtered_data$year)) +
  theme(
    legend.position = "bottom",
    legend.title = element_text(size = 14),
    legend.text = element_text(size = 12),
    axis.text.x = element_text(color = "grey20", size = 12),
    axis.text.y = element_text(color = "grey20", size = 12),
    axis.title.x = element_text(size = 14),
    axis.title.y = element_text(size = 14),
    plot.caption = element_text(size = 10)
  )
Maybe per capita emissions would be more informative? Let’s modify our code to check:
Code
# Identify the top 10 emitters based on CO2
top_emitters <- df %>%
  group_by(country) %>%
  summarize(total_emissions = sum(co2, na.rm = TRUE)) %>%
  arrange(desc(total_emissions)) %>%
  slice_head(n = 10) %>%
  select(country)

# Filter our data based on these 10 countries and sum over sectors
filtered_data <- df %>%
  semi_join(top_emitters, by = "country") %>%
  group_by(country, year) %>%
  summarize(
    co2_capita = sum(co2_capita, na.rm = TRUE),
    co2 = sum(co2, na.rm = TRUE),
    ch4 = sum(ch4, na.rm = TRUE),
    n20 = sum(n20, na.rm = TRUE)
  ) %>%
  ungroup()

# Same plot as before, now with per capita emissions on the y axis
p <- ggplot(filtered_data, aes(x = year, y = co2_capita, group = country)) +
  geom_line(aes(color = country)) +
  geom_point(aes(color = country)) +
  scale_color_brewer(palette = "PuOr") +
  labs(y = 'CO2 per capita', x = "Year", color = 'Country',
       caption = 'Source: Climate Trace', title = 'Country-level carbon emissions') +
  theme_minimal() +
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE)) +
  scale_x_continuous(breaks = unique(filtered_data$year)) +
  theme(
    legend.position = "bottom",
    legend.title = element_text(size = 14),
    legend.text = element_text(size = 12),
    axis.text.x = element_text(color = "grey20", size = 12),
    axis.text.y = element_text(color = "grey20", size = 12),
    axis.title.x = element_text(size = 14),
    axis.title.y = element_text(size = 14),
    plot.caption = element_text(size = 10)
  )

p
This provides us with a slightly different picture. Let’s stay with per capita carbon dioxide emissions. To make this plot interactive, we can just wrap it in ggplotly():
Code
library(plotly)   # provides ggplotly()
ggplotly(p)
It’s that simple! However, we might want to further customize some aspects of the plot to move from simply interactive to a well-designed template. Let’s focus on two key elements:
What interactive elements do we need?
What information should appear on hover, and how?
Plotly lets users select specific elements of the plot or zoom in and out. With our plot, however, these are functions we don’t need, and the floating toolbar might be disruptive. If a user accidentally zooms into the visualization, it might confuse them. So, let’s disable zooming and hide the toolbar. It is also possible to hide only some of the toolbar buttons if that works better for a given use case.
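A minimal sketch of what this could look like, using plotly's layout() and config(). The toy data frame and the names toy, p_toy, fig_toy and fig_partial are made up for illustration; in the tutorial you would start from the plot p instead.

```r
library(ggplot2)
library(plotly)

# Toy data standing in for the tutorial's filtered_data (values are made up)
toy <- data.frame(
  year       = rep(2015:2017, times = 2),
  country    = rep(c("A", "B"), each = 3),
  co2_capita = c(5.0, 5.2, 5.1, 9.8, 9.5, 9.9)
)

p_toy <- ggplot(toy, aes(x = year, y = co2_capita, color = country)) +
  geom_line() +
  geom_point()

fig_toy <- ggplotly(p_toy) %>%
  # fixedrange = TRUE locks an axis so it cannot be zoomed or panned
  layout(xaxis = list(fixedrange = TRUE),
         yaxis = list(fixedrange = TRUE)) %>%
  # Hide the toolbar entirely and switch off scroll-wheel zoom
  config(displayModeBar = FALSE, scrollZoom = FALSE)

# Alternative: keep the toolbar but remove only selected buttons
fig_partial <- ggplotly(p_toy) %>%
  config(modeBarButtonsToRemove = c("zoom2d", "pan2d", "select2d", "lasso2d"))

fig_toy
```

Locking the axes and hiding the toolbar are independent choices, so you can apply either one on its own.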
Next, plotly can be used to modify elements of the plot. I will set the background to a light gray with white grid lines. This is similar to how ggplot works, but it’s useful to know in case you already have a ggplotly object and want to modify that without going back to the original plot.
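As a rough sketch, the background and grid colors of an existing ggplotly object can be changed through layout(). The toy data, the object names and the exact gray shade are my own assumptions for illustration.

```r
library(ggplot2)
library(plotly)

# Toy data standing in for the tutorial's filtered_data (values are made up)
toy <- data.frame(
  year       = rep(2015:2017, times = 2),
  country    = rep(c("A", "B"), each = 3),
  co2_capita = c(5.0, 5.2, 5.1, 9.8, 9.5, 9.9)
)

p_toy <- ggplot(toy, aes(x = year, y = co2_capita, color = country)) +
  geom_line() +
  theme_minimal()

# Modify the already converted plot without touching the ggplot code
fig_bg <- ggplotly(p_toy) %>%
  layout(
    plot_bgcolor = "#EBEBEB",            # light gray plotting area
    xaxis = list(gridcolor = "white"),   # white vertical grid lines
    yaxis = list(gridcolor = "white")    # white horizontal grid lines
  )

fig_bg
```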
Finally, let’s modify the hover information. The easiest way is to format our data before plotting for a better output. There’s also a possibility to add additional information into the hover text: for example, we might be curious about total CO2 emissions when looking at the per capita values. Let’s format our data better and add total CO2 information to the hovertext.
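One common way to do this with ggplotly is to build the hover string as a column, map it to a text aesthetic, and ask ggplotly() to show only that aesthetic. The sketch below uses made-up toy values and a hypothetical hover column; ggplot2 will warn that text is an unknown aesthetic, which is expected here because only ggplotly uses it.

```r
library(ggplot2)
library(plotly)

# Toy data standing in for the tutorial's filtered_data (values are made up)
toy <- data.frame(
  year       = rep(2015:2017, times = 2),
  country    = rep(c("A", "B"), each = 3),
  co2_capita = c(5.0, 5.2, 5.1, 9.8, 9.5, 9.9),
  co2        = c(500, 520, 510, 300, 290, 305)
)

# Format the hover text before plotting; <br> creates a line break in plotly
toy$hover <- paste0(
  toy$country, " (", toy$year, ")",
  "<br>CO2 per capita: ", round(toy$co2_capita, 2), " t",
  "<br>Total CO2: ", round(toy$co2, 1), " Mt"
)

p_hover <- ggplot(toy, aes(x = year, y = co2_capita,
                           color = country, group = country)) +
  geom_line() +
  # `text` is not a ggplot2 aesthetic; ggplotly() picks it up for the tooltip
  geom_point(aes(text = hover))

# tooltip = "text" shows only our custom string on hover
fig_hover <- ggplotly(p_hover, tooltip = "text")

fig_hover
```

Attaching the text to the points only keeps the lines intact, since a unique text value per row could otherwise interfere with grouping.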
To achieve full customization, we can also re-create our plot with Plotly from scratch, which offers more customization options than converting a ggplot. ggplotly plots can be configured as well, but hover information is one area where native Plotly shines. A plot built from scratch is also more dynamic: while we can isolate each line on the ggplotly version as well, the Plotly-based plot dynamically rescales the y axis based on our selection, which can be a useful aspect of showcasing our results.
For this type of plot, however, the approach used here is more involved: we add each line as a separate trace and color each one separately. Let’s see an example with the top 10 emitters.
Code
library(tidyr)    # pivot_wider()
library(plotly)

# Reshape to one column per country, rounding values for cleaner hover text
plotly_data <- filtered_data %>%
  select(country, year, co2_capita, co2) %>%
  rename(`Carbon-Dioxide` = co2, `CO2 per capita (ton)` = co2_capita) %>%
  mutate(`Carbon-Dioxide` = round(`Carbon-Dioxide`, 2),
         `CO2 per capita (ton)` = round(`CO2 per capita (ton)`, 2)) %>%
  pivot_wider(names_from = country,
              values_from = c(`CO2 per capita (ton)`, `Carbon-Dioxide`),
              names_sep = " ")

# Add one trace per country, each with its own color
fig <- plot_ly(plotly_data, x = ~year, y = ~`CO2 per capita (ton) Canada`, name = 'Canada',
               type = 'scatter', mode = 'lines+markers',
               line = list(color = '#7F3B08'), marker = list(color = '#7F3B08')) %>%
  add_trace(y = ~`CO2 per capita (ton) China`, name = 'China',
            type = 'scatter', mode = 'lines+markers',
            line = list(color = '#B35806'), marker = list(color = '#B35806')) %>%
  add_trace(y = ~`CO2 per capita (ton) India`, name = 'India',
            type = 'scatter', mode = 'lines+markers',
            line = list(color = '#E08214'), marker = list(color = '#E08214')) %>%
  add_trace(y = ~`CO2 per capita (ton) Japan`, name = 'Japan',
            type = 'scatter', mode = 'lines+markers',
            line = list(color = '#FDB863'), marker = list(color = '#FDB863')) %>%
  add_trace(y = ~`CO2 per capita (ton) Saudi Arabia`, name = 'Saudi Arabia',
            type = 'scatter', mode = 'lines+markers',
            line = list(color = '#FEE0B6'), marker = list(color = '#FEE0B6')) %>%
  add_trace(y = ~`CO2 per capita (ton) United States of America`, name = 'United States',
            type = 'scatter', mode = 'lines+markers',
            line = list(color = '#D8DAEB'), marker = list(color = '#D8DAEB')) %>%
  add_trace(y = ~`CO2 per capita (ton) Germany`, name = 'Germany',
            type = 'scatter', mode = 'lines+markers',
            line = list(color = '#B2ABD2'), marker = list(color = '#B2ABD2')) %>%
  add_trace(y = ~`CO2 per capita (ton) Iran`, name = 'Iran',
            type = 'scatter', mode = 'lines+markers',
            line = list(color = '#8073AC'), marker = list(color = '#8073AC')) %>%
  add_trace(y = ~`CO2 per capita (ton) Russia`, name = 'Russia',
            type = 'scatter', mode = 'lines+markers',
            line = list(color = '#542788'), marker = list(color = '#542788')) %>%
  add_trace(y = ~`CO2 per capita (ton) South Korea`, name = 'South Korea',
            type = 'scatter', mode = 'lines+markers',
            line = list(color = '#2D004B'), marker = list(color = '#2D004B')) %>%
  layout(title = 'Country-level carbon emissions',
         xaxis = list(title = "Year"),
         yaxis = list(title = "CO2 per capita (tonnes)"),
         hovermode = 'x unified') %>%
  # The piped plot is already the first argument, so `fig` is not passed again
  config(scrollZoom = FALSE, displayModeBar = FALSE)

fig
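If adding every trace by hand feels heavy, a more compact alternative is to keep the data in long format and let plot_ly() split it into one trace per country through its color argument. This is a sketch of a different technique from the code above, not a replacement for it; the toy data and the names toy_long and fig_long are made up for illustration.

```r
library(plotly)

# Toy long-format data: one row per country and year (values are made up)
toy_long <- data.frame(
  year       = rep(2015:2017, times = 2),
  country    = rep(c("A", "B"), each = 3),
  co2_capita = c(5.0, 5.2, 5.1, 9.8, 9.5, 9.9)
)

fig_long <- plot_ly(
  toy_long,
  x = ~year, y = ~co2_capita,
  color = ~country,                  # a discrete variable yields one trace per level
  colors = c("#7F3B08", "#542788"),  # colors are assigned to the levels of country
  type = "scatter", mode = "lines+markers"
) %>%
  layout(title = "Country-level carbon emissions",
         xaxis = list(title = "Year"),
         yaxis = list(title = "CO2 per capita (tonnes)"),
         hovermode = "x unified") %>%
  config(scrollZoom = FALSE, displayModeBar = FALSE)

fig_long
```

The trade-off is less per-trace control: adding traces individually makes it easier to give one country its own name, color or hover text.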
Alternatively, we could create any custom hover information or even differentiate it per country. The Plotly documentation outlines all the possibilities.
So, to recap, we’ve seen how Plotly can be used in two ways to create interactive visualizations. First, it can be used as a quick method to convert ggplots into interactive plots. This is a very fast and easy method, but it also offers less customization. We could also use Plotly to build up a graph, which offers the full variety of interactivity, but the drawback is we need to learn how to use a visualization library that has a very different logic compared to ggplot.
Interactivity itself might have its drawbacks. While users can interact more with these plots, if we’re presenting, it might be easier to create a static plot that displays exactly what our takeaway is. For instance, if our takeaway is how per capita emissions have been increasing in Russia, especially in 2020, we could highlight only that line and add text labels on a static plot. With the interactive version, we would have to navigate to that point and explain it; on the other hand, once there, we have the option to display more nuanced information as well. Interactivity is especially useful for dashboards, where users can create their own views.
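To illustrate the static alternative, here is a minimal sketch that grays out every line except one and labels it directly. The countries, values and label text are placeholders I made up, and highlight_country, label_point and p_static are hypothetical names.

```r
library(ggplot2)

# Toy data with three placeholder countries (values are made up)
toy <- data.frame(
  year       = rep(2017:2020, times = 3),
  country    = rep(c("A", "B", "C"), each = 4),
  co2_capita = c(5.0, 5.1, 5.0, 4.9,
                 9.0, 9.2, 9.6, 10.4,
                 7.0, 6.9, 6.8, 6.5)
)

highlight_country <- "B"

# The last observation of the highlighted country anchors the text label
label_point <- toy[toy$country == highlight_country &
                     toy$year == max(toy$year), ]

p_static <- ggplot(toy, aes(x = year, y = co2_capita, group = country)) +
  # Color depends only on whether the line is the highlighted one
  geom_line(aes(color = country == highlight_country)) +
  scale_color_manual(values = c(`TRUE` = "#542788", `FALSE` = "grey75"),
                     guide = "none") +
  annotate("text", x = label_point$year, y = label_point$co2_capita,
           label = paste0(highlight_country, ": ", label_point$co2_capita, " t"),
           hjust = 1, vjust = -0.8, color = "#542788") +
  labs(x = "Year", y = "CO2 per capita") +
  theme_minimal()

p_static
```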
Animate your graphs for a more dramatic effect
Let’s talk about animating plots. This can be a powerful tool in a data scientist’s visualization arsenal: an animated graph tells a story on its own. On the other hand, just like with interactivity, if we want to focus on a specific point or specific changes, it might be better to highlight this on a static plot. Animated plots are particularly useful for showing changes over time, especially with many moving parts and a longer time frame.
For our purposes, we will leverage the gganimate library. As with ggplotly, we can leverage our existing ggplot skills for these animations. Let’s again create a static plot with ggplot for illustration. For this example, we will look at yearly global sector-wide emissions of CO2.
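The animation code further down relies on a data frame called sector_data with the columns sector, year and co2. How it was prepared is not shown here, so the following is only a sketch under my own assumption that it sums CO2 over all countries per sector and year. The toy values and the names toy_df and sector_totals are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the merged data (values are made up)
toy_df <- tibble(
  country          = rep(c("A", "B"), each = 4),
  sector.subsector = rep(c("power", "power", "transport", "transport"), times = 2),
  year             = rep(c(2015L, 2016L), times = 4),
  co2              = c(10, 11, 4, 5, 20, 19, 6, 7)
)

# Assumption: global emissions per sector and year are sums over countries
sector_data <- toy_df %>%
  group_by(sector = sector.subsector, year) %>%
  summarize(co2 = sum(co2, na.rm = TRUE), .groups = "drop")

# Static version: total emissions per sector over the whole period
sector_totals <- sector_data %>%
  group_by(sector) %>%
  summarize(co2 = sum(co2), .groups = "drop")

p_sector <- ggplot(sector_totals, aes(x = reorder(sector, co2), y = co2)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(y = "Carbon-Dioxide (M tonnes)", x = "",
       title = "Sector-wide global carbon emissions") +
  theme_minimal()

p_sector
```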
To add animation, we can reuse the same plot, but instead of the total, it will show the values for each year in turn. To do so, we only need to add transition_time() and ease_aes(). The graph title can also be updated to change dynamically. Adding a text layer also helps visualize the changes.
Code
library(gganimate)

# sector_data holds yearly global CO2 per sector (columns: sector, year, co2)
ggplot(sector_data, aes(x = reorder(sector, co2), y = co2)) +
  geom_col(fill = 'steelblue') +
  coord_flip() +
  # Value labels, nudged past the end of each bar
  geom_text(aes(label = format(round(co2, 2)), y = co2 + 1300), size = 4) +
  labs(y = 'Carbon-Dioxide (M tonnes)', x = '', caption = ' Source: Climate Trace',
       title = 'Sector-wide global carbon emissions in {frame_time}') +
  theme_minimal() +
  theme(
    axis.text.x = element_text(color = "grey20", size = 12),
    axis.text.y = element_text(color = "grey20", size = 12),
    axis.title.y = element_text(size = 14),
    plot.caption = element_text(size = 10)
  ) +
  # Animate over the year column; {frame_time} in the title follows along
  transition_time(year) +
  ease_aes('linear')
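To control the length of the animation or save it to a file, gganimate provides animate() and anim_save(). The sketch below uses toy data, and the helper render_gif() is a hypothetical wrapper of my own; it assumes the gifski package is installed for GIF output. The file name is only an example.

```r
library(ggplot2)
library(gganimate)

# Toy yearly sector data (values are made up)
toy_sector <- data.frame(
  sector = rep(c("power", "transport"), each = 3),
  year   = rep(2015:2017, times = 2),
  co2    = c(30, 31, 29, 10, 12, 13)
)

anim <- ggplot(toy_sector, aes(x = sector, y = co2)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(title = "Emissions in {frame_time}") +
  transition_time(year) +
  ease_aes("linear")

# Hypothetical helper: render the animation and write it to `path`.
# nframes and fps together determine how long the animation runs.
render_gif <- function(plot, path, nframes = 50, fps = 10) {
  rendered <- animate(plot, nframes = nframes, fps = fps,
                      renderer = gifski_renderer())
  anim_save(path, animation = rendered)
  invisible(path)
}

# Example call (writes a file, so it is left commented out):
# render_gif(anim, "sector_emissions.gif")
```

With 50 frames at 10 frames per second, the animation lasts five seconds; fewer frames per year make the transitions feel more abrupt.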
As can be seen, this package is simple to use with ggplot and does not require us to learn a completely new visualization syntax, like Plotly did. Since ggplot is already a powerful library with many interesting options, gganimate can incorporate all of them, allowing for custom results.
One drawback of this type of animation is that it really depends on the data to work. Even if we choose an appropriate visualization element (a bar chart in this case), if the changes are not pronounced enough (as here), the animated visualization can have little effect. It would probably work better with a dataset where the changes are more pronounced. On the other hand, it also conveys the message quite well, which could be ‘There’s no significant change in the amount of carbon the top polluting industries emit’.
I hope you found this tutorial useful! Both ggplotly and gganimate are simple to implement and, when appropriate, they add substantial impact to any visualization. Plotly has even more potential for interactivity, but it’s a bit more difficult to use. R has many similarly useful visualization tools. If you liked these, check out a few more at the R graph gallery!